Cluster_2_of_6_D90

Bayesian Identification of HLA Allotypes from Gibbs Sampling Motifs

Analysis Date: March 10, 2025

Executive Summary

Dataset: Unknown Dataset
Top HLA Match: HLA_A0101
Posterior Probability: 0.694
Bayes Factor: 1.0
Adjusted p-value: 0.121

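This report presents the results of a Bayesian analysis comparing a Gibbs-sampled motif from the dataset "Unknown Dataset" against a reference database of HLA allotype motifs. HLA_A0101 was the most probable match, with a posterior probability of 0.694. The supporting statistics are weak, however: a Bayes factor of 1.0 conventionally indicates no evidence for or against a hypothesis, and the adjusted p-value of 0.121 does not reach the conventional 0.05 significance threshold. The input peptide data may therefore reflect binding specificities consistent with HLA_A0101, but this assignment should be treated as tentative rather than high-confidence.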

Key Visualizations

PSSM Comparison: Reference vs. Gibbs Motifs
[Figure: PSSM comparison visualization]

This visualization compares the Position-Specific Scoring Matrix (PSSM) of the top-matched HLA allotype (HLA_A0101) with the Gibbs-derived motif. The comparison includes log-odds scores, information content, positional Gini coefficients, and similarity metrics.
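As a rough illustration of the log-odds scores in this comparison, the sketch below converts a position-by-residue frequency matrix into log2-odds scores against a background distribution. The function name `log_odds_pssm`, the 20-residue column order, the uniform default background, and the pseudocount are all assumptions for illustration; the report does not state which background or pseudocount its pipeline uses.

```python
import math

# Assumed column order for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_pssm(freqs, background=None, pseudocount=1e-3):
    """Convert a position-by-residue frequency matrix to log2-odds scores.

    freqs: one row per motif position, each row holding 20 probabilities
    in AMINO_ACIDS order. background: 20 background probabilities; a
    uniform background (1/20 each) is assumed when none is given.
    """
    if background is None:
        background = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    pssm = []
    for row in freqs:
        # A small pseudocount keeps unobserved residues at a finite score;
        # the row is then renormalized so it still sums to 1.
        padded = [p + pseudocount for p in row]
        total = sum(padded)
        pssm.append([math.log2((p / total) / q) for p, q in zip(padded, background)])
    return pssm


# Illustrative single-position motif dominated by tyrosine (made-up values).
example = [0.005] * 20
example[AMINO_ACIDS.index("Y")] = 0.905
scores = log_odds_pssm([example])[0]
print(f"Y: {scores[AMINO_ACIDS.index('Y')]:+.2f}  A: {scores[AMINO_ACIDS.index('A')]:+.2f}")
```

A positive score marks a residue enriched at that position relative to the background, and a negative score marks one that is depleted.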

Bayesian Analysis of HLA Allotype Matches
[Figure: Bayesian analysis visualization]

This visualization shows the Bayesian analysis workflow, including the prior probabilities, the likelihood values based on PSSM similarity, and the resulting posterior probabilities for each HLA allotype. The top panel diagrams the analysis workflow, and the bar charts compare these values across the top-ranked allotype matches.
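The Bayes factor reported alongside these values can be related to the prior and posterior shown in the charts. The sketch below assumes the common one-versus-rest definition, in which the Bayes factor is the posterior odds divided by the prior odds; the report does not say which definition it uses, so `bayes_factor` is a hypothetical helper rather than the pipeline's code.

```python
def bayes_factor(prior, posterior):
    """Bayes factor for 'the motif matches allotype H' versus 'it does not'.

    Assumes the one-vs-rest definition: posterior odds divided by prior odds.
    """
    if not (0.0 < prior < 1.0 and 0.0 < posterior < 1.0):
        raise ValueError("prior and posterior must lie strictly between 0 and 1")
    posterior_odds = posterior / (1.0 - posterior)
    prior_odds = prior / (1.0 - prior)
    return posterior_odds / prior_odds


# Illustrative values only: a prior of 0.2 updated to a posterior of 0.5.
print(bayes_factor(0.2, 0.5))
```

Under this definition, a value above 1 means the data increased the odds of the allotype relative to its prior, and a value of exactly 1 means the odds were left unchanged.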

HLA Allotype Network Analysis
[Figure: HLA allotype network visualization]

This interactive network visualization shows the relationships between the Gibbs-derived motif and the HLA allotypes. Node size reflects posterior probability, with larger nodes indicating stronger matches. Edge weights represent the strength of the association between the Gibbs motif and each allotype. Node colors indicate the HLA class I locus: HLA-A, HLA-B, or HLA-C.

HLA Allotype Matches Data

Detailed Results
Rank | Allotype | Prior | Likelihood | Posterior | Bayes Factor | P-value | Adjusted P-value | Significant

Methodology

PSSM Analysis

Position-specific scoring matrices (PSSMs) capture the amino acid preferences at each position of an HLA binding motif. We compare the Gibbs-derived motif with each reference HLA allotype motif using information-theoretic metrics.
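One way to compare two motifs with an information-theoretic metric is the Jensen-Shannon distance, computed per position and then averaged. The sketch below assumes both motifs are already aligned to the same length and stored as lists of per-position residue frequency vectors; averaging over positions is an assumption, since the report does not say how per-position distances are combined.

```python
import math


def _kl_bits(p, q):
    # Kullback-Leibler divergence in bits; terms where p_i == 0 contribute 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence, in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding error


def motif_distance(motif_a, motif_b):
    """Mean per-position JS distance between two aligned frequency matrices."""
    if len(motif_a) != len(motif_b):
        raise ValueError("motifs must be aligned to the same length")
    return sum(js_distance(a, b) for a, b in zip(motif_a, motif_b)) / len(motif_a)
```

With base-2 logarithms the distance ranges from 0 (identical distributions) to 1 (distributions with no residues in common). If SciPy is available, `scipy.spatial.distance.jensenshannon(p, q, base=2)` computes the same per-position quantity.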

Bayesian Inference

We apply Bayes' theorem to calculate the posterior probability that the Gibbs motif represents each HLA allotype, combining priors derived from population frequencies with likelihoods based on motif similarity.
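The combination of priors and similarity-based likelihoods can be sketched with Bayes' theorem: each allotype's posterior is its prior times its likelihood, divided by the sum of those products over all allotypes. The report does not give its likelihood function, so the exponential form exp(-beta * distance) and the `beta` parameter below are hypothetical choices for illustration, as are the example inputs.

```python
import math


def posteriors(priors, distances, beta=10.0):
    """Posterior probability of each allotype via Bayes' theorem.

    priors: dict mapping allotype -> prior probability.
    distances: dict mapping allotype -> motif distance (smaller = more similar).
    beta: hypothetical sharpness of the assumed likelihood exp(-beta * distance).
    """
    unnormalized = {
        allotype: priors[allotype] * math.exp(-beta * distances[allotype])
        for allotype in priors
    }
    evidence = sum(unnormalized.values())  # normalizing constant P(data)
    return {allotype: value / evidence for allotype, value in unnormalized.items()}


# Illustrative inputs only, not values from this report.
example = posteriors(
    priors={"HLA_A0101": 0.5, "HLA_A0201": 0.5},
    distances={"HLA_A0101": 0.2, "HLA_A0201": 0.4},
)
print(example)
```

Because the posteriors are normalized over the allotypes supplied, adding or removing reference allotypes changes every posterior, not just the one added or removed.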

Statistical Significance

We assess statistical significance by permutation testing and correct for multiple comparisons with the Benjamini-Hochberg procedure, which controls the false discovery rate.
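The two steps can be sketched as follows: an empirical one-sided p-value from a set of null similarity scores, followed by Benjamini-Hochberg adjustment across all allotypes tested. How the report generates its null scores (for example, which elements of the Gibbs matrix are permuted) is not stated, so the null scores are simply passed in here; the (1 + exceedances) / (1 + n) estimator is a common convention assumed for this sketch.

```python
def permutation_p_value(observed, null_scores):
    """One-sided empirical p-value: (1 + #null scores >= observed) / (1 + n)."""
    exceedances = sum(1 for score in null_scores if score >= observed)
    return (1 + exceedances) / (1 + len(null_scores))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Under this estimator, the report's 1,000 permutations give a smallest attainable p-value of 1/1001, roughly 0.001.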

Data Sources
  • Gibbs Matrix:
  • Reference Allotypes:
  • Permutations: 1,000 (used for significance testing)
Metrics Used
  • Similarity: Jensen-Shannon distance
  • Information Content: Shannon entropy-based
  • Specificity: Gini coefficient
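The information content and Gini specificity metrics above can be sketched for a single motif position as follows. Defining information content as log2 of the alphabet size minus the Shannon entropy (with no small-sample correction) and using the mean-absolute-difference form of the Gini coefficient are assumptions; the report does not give its exact formulas.

```python
import math


def information_content(column):
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return math.log2(len(column)) - entropy


def gini(column):
    """Gini coefficient of one position's residue frequencies (0 = uniform)."""
    n = len(column)
    total = sum(column)
    if total == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs of residues.
    abs_diffs = sum(abs(a - b) for a in column for b in column)
    return abs_diffs / (2 * n * total)


uniform = [1.0 / 20] * 20
fixed = [0.0] * 19 + [1.0]
print(information_content(uniform), gini(uniform))
print(information_content(fixed), gini(fixed))
```

A position where every residue is equally likely scores 0 on both metrics, while a position fixed on a single residue reaches the maximum information content, log2(20) ≈ 4.32 bits, and a Gini coefficient of 19/20 = 0.95.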
Performance Metrics